explanation technique
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > Macao (0.04)
- Research Report > New Finding (0.46)
- Overview (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Data Science > Data Mining (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
AttCAT: Explaining Transformers via Attentive Class Activation Tokens
Transformers have improved the state-of-the-art in various natural language processing and computer vision tasks. However, the success of the Transformer model has not yet been duly explained. Current explanation techniques, which dissect either the self-attention mechanism or gradient-based attribution, do not necessarily provide a faithful explanation of the inner workings of Transformers due to the following reasons: first, attention weights alone without considering the magnitudes of feature values are not adequate to reveal the self-attention mechanism; second, whereas most Transformer explanation techniques utilize self-attention module, the skip-connection module, contributing a significant portion of information flows in Transformers, has not yet been sufficiently exploited in explanation; third, the gradient-based attribution of individual feature does not incorporate interaction among features in explaining the model's output. In order to tackle the above problems, we propose a novel Transformer explanation technique via attentive class activation tokens, aka, AttCAT, leveraging encoded features, their gradients, and their attention weights to generate a faithful and confident explanation for Transformer's output. Extensive experiments are conducted to demonstrate the superior performance of AttCAT, which generalizes well to different Transformer architectures, evaluation metrics, datasets, and tasks, to the baseline methods.
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > Macao (0.04)
- Research Report > New Finding (0.46)
- Overview (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Data Science > Data Mining (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
GDLNN: Marriage of Programming Language and Neural Networks for Accurate and Easy-to-Explain Graph Classification
Jeon, Minseok, Park, Seunghyun
GDLNN combines a domain-specific programming language, called GDL, with neural networks. The main strength of GDLNN lies in its GDL layer, which generates expressive and interpretable graph representations. Since the graph representation is interpretable, existing model explanation techniques can be directly applied to explain GDLNN's predictions. Our evaluation shows that the GDL-based representation achieves high accuracy on most graph classification benchmark datasets, outperforming dominant graph learning methods such as GNNs. Applying an existing model explanation technique also yields high-quality explanations of GDLNN's predictions. Furthermore, the cost of GDLNN is low when the explanation cost is included. In graph classification, graph representation learning holds the key to success. The goal of this task is to learn useful feature representations (embeddings) for the entire graph that effectively capture its key structure and properties. These representations are utilized in various graph machine learning tasks, including graph classification. Beyond predictive performance, learning interpretable graph representations has become increasingly important, as it directly impacts model explainability, crucial in decision-critical domains such as drug discovery (Kakkad et al., 2023).
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > South Korea > Daegu > Daegu (0.04)
SHLIME: Foiling adversarial attacks fooling SHAP and LIME
Chauhan, Sam, Duguet, Estelle, Ramakrishnan, Karthik, Van Deventer, Hugh, Kruger, Jack, Subbaraman, Ranjan
Post hoc explanation methods, such as LIME and SHAP, provide interpretable insights into black-box classifiers and are increasingly used to assess model biases and generalizability. However, these methods are vulnerable to adversarial manipulation, potentially concealing harmful biases. Building on the work of Slack et al. (2020), we investigate the susceptibility of LIME and SHAP to biased models and evaluate strategies for improving robustness. We first replicate the original COMPAS experiment to validate prior findings and establish a baseline. We then introduce a modular testing framework enabling systematic evaluation of augmented and ensemble explanation approaches across classifiers of varying performance. Using this framework, we assess multiple LIME/SHAP ensemble configurations on out-of-distribution models, comparing their resistance to bias concealment against the original methods. Our results identify configurations that substantially improve bias detection, highlighting their potential for enhancing transparency in the deployment of high-stakes machine learning systems.
- Information Technology > Security & Privacy (0.42)
- Government > Military (0.42)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
ExplainBench: A Benchmark Framework for Local Model Explanations in Fairness-Critical Applications
As machine learning systems are increasingly deployed in high-stakes domains such as criminal justice, finance, and healthcare, the demand for interpretable and trustworthy models has intensified. Despite the proliferation of local explanation techniques, including SHAP, LIME, and counterfactual methods, there exists no standardized, reproducible framework for their comparative evaluation, particularly in fairness-sensitive settings. We introduce ExplainBench, an open-source benchmarking suite for systematic evaluation of local model explanations across ethically consequential datasets. ExplainBench provides unified wrappers for popular explanation algorithms, integrates end-to-end pipelines for model training and explanation generation, and supports evaluation via fidelity, sparsity, and robustness metrics. The framework includes a Streamlit-based graphical interface for interactive exploration and is packaged as a Python module for seamless integration into research workflows. We demonstrate ExplainBench on datasets commonly used in fairness research, such as COMPAS, UCI Adult Income, and LendingClub, and showcase how different explanation methods behave under a shared experimental protocol. By enabling reproducible, comparative analysis of local explanations, ExplainBench advances the methodological foundations of interpretable machine learning and facilitates accountability in real-world AI systems.
- Law (0.49)
- Health & Medicine (0.49)
- Information Technology (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Uncovering the Structure of Explanation Quality with Spectral Analysis
Maeß, Johannes, Montavon, Grégoire, Nakajima, Shinichi, Müller, Klaus-Robert, Schnake, Thomas
As machine learning models are increasingly considered for high-stakes domains, effective explanation methods are crucial to ensure that their prediction strategies are transparent to the user. Over the years, numerous metrics have been proposed to assess quality of explanations. However, their practical applicability remains unclear, in particular due to a limited understanding of which specific aspects each metric rewards. In this paper we propose a new framework based on spectral analysis of explanation outcomes to systematically capture the multifaceted properties of different explanation techniques. Our analysis uncovers two distinct factors of explanation quality-stability and target sensitivity-that can be directly observed through spectral decomposition. Experiments on both MNIST and ImageNet show that popular evaluation techniques (e.g., pixel-flipping, entropy) partially capture the trade-offs between these factors. Overall, our framework provides a foundational basis for understanding explanation quality, guiding the development of more reliable techniques for evaluating explanations.
- Europe > Germany > Berlin (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > Italy > Marche > Ancona Province > Ancona (0.04)
- (4 more...)